Cooperativity and 3-D Representation


Objectives 

Our  work  has  concentrated  on  how  2-D  information  is  built  up  from  the  parallel 
analysis  of  a  set  of  visual  attributes  and  how  this  information  contacts  memory  in  order  to 
construct  3-D  representations  of  the  visual  scene.  We  are  interested  in  the  coding  of 
image  contours  and  how  they  arise  from  the  various  attributes  which  can  define  contours. 
We  have  examined  the  decomposition  of  image  values  into  object  features  (reflectance, 
orientation,  3D  position)  and  illumination  features  (shadows,  shading,  highlights)  and 
especially  how  the  perception  of  transparency  leads  to  the  distribution  of  image  values  to 
two  or  more  superimposed  surfaces.  Finally,  we  have  studied  the  initial  contact  between 
the  image  contours  and  memory  in  recognition. 

Progress  Report 

Several  projects  dealing  with  the  contribution  of  early  streams  to  a  common 
representation  have  been  completed  during  the  grant  period.  These  will  be  described  first 
followed  by  the  results  in  our  visual  search  explorations  of  coding  in  early  processing  and 
finally  by  the  priming  results  for  the  early  2-D  match  experiments. 

1.  Cross-media  cooperation  in  contour  localization.  If  different  attributes  such 
as  color,  motion,  luminance,  and  texture  are  first  analyzed  independently,  their 
representations  must  subsequently  be  recombined  in  some  manner.  In  order  to  determine 
the  nature  of  this  recombination,  we  completed  several  studies  using  a  vernier  acuity  task. 
Some studies have been published on vernier tasks for media other than luminance. In
particular,  Regan  (1986)  has  reported  that  the  alignment  of  motion-defined  edges  is  as 
accurate  as  that  for  luminance-defined  edges  and  Farell  and  Krauskopf  (1989)  make  the 
same  point  for  vernier  alignment  of  low-pass,  equiluminous,  chromatic  bars. 


[Figure 6, panels a and b; instruction shown in the figure: "Align bar to the apparent position of the upper border".]

Fig. 6. (a) Vernier alignment task where the upper border is a discontinuity defined in one
or several media (color and motion depicted here). (b) Vernier alignment of one border in
the presence of a second border defined in a different medium.

In the first experiment, a vertical border was presented in the upper field, defined
by one attribute (e.g. green on the left, white on the right, both areas filled with a fine
dynamic  luminance  texture)  or  several  as  shown  in  Figure  6a  (e.g.  green  with  texture 
moving  up  on  the  left,  white  with  dynamic  texture  having  no  coherent  motion  on  the 
right).  The  lower  field  was  white  with  a  fine,  vertical,  black  line  whose  position  was 
randomly  offset  relative  to  that  of  the  upper  border.  Observers  reported  whether  the  lower 
line fell to the left or right of the upper border in a forced-choice task. The magnitude of the
color,  luminance,  or  texture  discontinuity  defining  the  border  was  adjusted  to  produce 
approximately  equal  precision  of  localization  for  each  attribute  when  presented  alone. 
The  result  of  interest  was  the  effect  of  defining  the  border  by  discontinuities  in  several 
attributes  simultaneously  (two  or  three  at  a  time).  The  degree  of  improvement  of 
localization  (the  JND  of  the  psychometric  function)  indicated  that  all  attributes  tested 
(color,  luminance,  and  texture)  contributed  equally  to  the  improvement  when  they  were 
paired.  Moreover,  the  improvement  was  greater  than  could  be  expected  from  an 
independent  localization  of  the  discontinuities  in  each  attribute  followed  by  probability 
summation of the decisions comparing each of those locations to the test line below.
These results indicated that some graded information from each attribute was being
combined before the localization decision was made.
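
The comparison against probability summation can be illustrated with a small numerical sketch. It is not the analysis used in these experiments. It assumes a Quick (Minkowski) pooling rule with an illustrative exponent `beta = 3.5` for probability summation and, for combination before the decision, the textbook case of independent noisy position signals whose reliabilities add. The function names are hypothetical.

```python
import math

def probability_summation_jnd(jnds, beta=3.5):
    """JND predicted when each attribute yields its own decision and the
    decisions are pooled by probability summation (Quick pooling).
    beta is an assumed psychometric slope, not a value from the report."""
    return sum(j ** -beta for j in jnds) ** (-1.0 / beta)

def pooled_signal_jnd(jnds):
    """JND predicted when graded position signals with independent noise
    are combined before a single decision (reliabilities 1/jnd^2 add)."""
    return 1.0 / math.sqrt(sum(1.0 / j ** 2 for j in jnds))

# Two attributes equated to the same single-attribute JND (arbitrary units).
single = [1.0, 1.0]
print(round(probability_summation_jnd(single), 3))  # 0.82
print(round(pooled_signal_jnd(single), 3))          # 0.707
```

Under these assumptions, a measured JND for paired attributes that falls below the probability-summation prediction is the pattern the text describes as evidence for combination of graded information.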

The  next  experiment  demonstrated 
that  this  graded  information  was  a  profile  of 
activity  representing  the  discontinuity  and 
that  these  profiles  were  summed  into  a 
common  representation,  independently  of  the 
attributes  defining  the  discontinuities. 
Westheimer  and  his  colleagues  (eg,  Badcock 
&  Westheimer,  1985)  had  earlier  shown  that 
adjacent  luminance  contours  interact, 
attracting  or  repelling  each  other  depending 
on  their  spacing.  These  shifts  could  be  accurately  modeled  by  summing  the  profiles  of 
activity  of  cortical  cells  to  each  line.  The  peaks  of  these  summed  distributions  shifted 
towards  each  other  or  away  in  a  manner  consistent  with  the  psychophysical  results. 
Assuming  that  each  line’s  position  was  based  on  the  peak  of  the  profile  of  neural  activity 
in  response  to  the  line,  this  result  established  that  the  two  luminance  contours  could 
interact  through  a  common,  spatially  extended  representation.  We  extended  these 
experiments  to  measure  the  interactions  between  contours  defined  by  different  attributes. 

Fig. 7. Apparent shift of test line caused by the adjacent flanking line. The pattern was
similar for all pairings of test and flanking attributes.

The displacement of the apparent border position was measured as a function of the
position  of  the  neighboring  border  defined  by  a  different  attribute  (see  Figure  6b,  again 
after  adjusting  the  magnitude  of  the  border  discontinuity  in  each  attribute  so  that 
localization  was  equally  precise  when  the  border  was  presented  in  each  attribute  alone). 
The  results  (see  Fig.  7  for  the  average  position  shift)  showed  that  basically  all  contours 
interacted  in  a  similar  manner  suggesting  a  common  final  representation  for  contours 
independent  of  the  attribute  that  defines  them.  For  example,  a  color  contour  was  strongly 
attracted  toward  an  adjacent  luminance  contour  if  the  two  were  within  10  min  of  arc  of 
each other. The reverse was also true: a luminance contour was attracted towards an
adjacent color contour. This pattern of attraction (repulsion was much weaker) again
suggests the summing of profiles of activity as originally proposed by Badcock and
Westheimer (1985) but now the profiles originate from the analyses of different attributes
and they must be summed in some attribute-invariant, final representation. These results
challenge the models of Gregory (1979) and Grossberg and Mingolla (1985) that predict
that luminance is a privileged signal for contours.
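
The idea of summed activity profiles can be sketched numerically. The example below is a minimal illustration, not the model fitted to the data: it assumes Gaussian profiles with an arbitrary width (`sigma = 3.0`, nominally in min of arc) and a weaker flanking contour, and it reads the apparent position of the test contour from the peak of the summed profile. All names and parameter values are hypothetical. Purely excitatory profiles of this kind produce only attraction; repulsion at larger spacings would require profiles with inhibitory flanks, which this sketch omits.

```python
import math

def profile(x, center, amplitude=1.0, sigma=3.0):
    """Gaussian activity profile (arbitrary units; sigma is an assumed width)."""
    return amplitude * math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def apparent_shift(separation, flank_amplitude=0.5, step=0.01):
    """Shift of the summed-profile peak near a test contour at 0 caused by a
    flanking contour at +separation. Positive values mean attraction."""
    half = separation / 2.0
    n = int(round(2 * half / step))
    best_x, best_val = 0.0, -1.0
    # Grid search for the peak between -separation/2 and +separation/2.
    for i in range(n + 1):
        x = -half + i * step
        val = profile(x, 0.0) + profile(x, separation, flank_amplitude)
        if val > best_val:
            best_x, best_val = x, val
    return best_x

for sep in (4.0, 10.0, 20.0):
    print(sep, round(apparent_shift(sep), 2))
```

In this sketch the shift is largest at small separations and falls toward zero as the contours move apart, and the computation is the same whichever attribute produced each profile.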

This work was presented at ARVO in 1991 and 1992 and manuscripts are being
prepared for publication. Josée Rivest has returned to our lab this summer to work on
these  manuscripts  and  two  additional  experiments. 

2.  Early  visual  memory:  persistence.  With  Dr.  Satoshi  Shioiri  at  ATR  in  Japan, 
we  investigated  the  memory  for  spatial  position  provided  by  luminance  and  by  relative 
motion  (Shioiri  &  Cavanagh,  1992).  The  early  visual  memory  that  we  tested  is  the 
decaying  visible  trace  of  the  stimulus.  A  partial  (4x4)  matrix  technique  developed  by 
DiLollo  (1977)  was  used  where  the  first  partial  set  of  elements  precedes  the  second 
partial  set  by  a  short  interval.  One  element  is  missing  from  the  combined  representation 
of  the  two  partial  matrices.  Because  of  the  large  number  of  matrix  positions  (16  in  our 
experiments),  the  missing  element  can  only  be  identified  if  the  image  of  the  first  partial 
matrix  persists  until  the  presentation  of  the  second.  The  two  are  then  effectively 
superimposed  and  the  missing  element  “pops  out”.  We  found  that  pattern  features  defined 
by relative motion did exhibit visual persistence for durations similar to that for
luminance-defined patterns. Although we had too little information to draw a final
conclusion, it is possible that there is one site for visual persistence that follows the
attribute-specific  detectors  and  that  is  largely  attribute  independent. 
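
The logic of the missing-element task can be sketched as follows. This is an illustration only: the way the 15 presented elements are divided between the two partial matrices (7 and 8 here) is an assumption, and the function names are hypothetical.

```python
import random

def make_trial(n_positions=16, rng=None):
    """Split all but one matrix position into two partial matrices.
    The 7/8 split between frames is assumed, not taken from the report."""
    rng = rng or random.Random()
    positions = list(range(n_positions))
    rng.shuffle(positions)
    missing, rest = positions[0], positions[1:]
    half = len(rest) // 2
    return set(rest[:half]), set(rest[half:]), missing

def candidate_missing(first, second, n_positions=16):
    """Positions left empty once the visible frames are superimposed."""
    return set(range(n_positions)) - (first | second)

first, second, missing = make_trial(rng=random.Random(0))
# With persistence, both frames are available and one position remains.
print(candidate_missing(first, second) == {missing})
# Without persistence, only the second frame is available.
print(len(candidate_missing(set(), second)))
```

When the first frame has faded, many positions remain empty and the missing element cannot be singled out, which is why performance on the task indexes persistence.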

3.  Monocular  depth  cues.  Our  earlier  work  on  size  and  tilt  aftereffects 
(Cavanagh,  1989;  Favreau  &  Cavanagh,  1981;  Flanagan,  Cavanagh  &  Favreau,  1990) 
gave  no  evidence  of  special  primacy  or  privilege  for  luminance  information  other  than  the 
extra resolution it affords. The same conclusion also appears to hold for the analysis of
3-D shape (Cavanagh, 1987, 1988). These results are in opposition to the claims of
Livingstone and Hubel (1987) that several monocular depth cues are ineffective when
presented  in  equiluminous,  chromatic  stimuli.  However,  the  loss  of  depth  in  their  stimuli 
was  likely  due  to  the  fine  detail  in  their  figures  which  could  only  be  clearly  resolved 
(especially  in  the  periphery)  if  luminance  was  present.  At  equiluminance,  therefore,  both 
the stimulus and the depth were difficult to see. In a continuation of these studies, Lee
Zimmerman, Gordon Legge, and I evaluated the efficiency of perspective cues for surface
slant when a tilted plane is defined in luminance as compared to when it is defined in
color (Zimmerman, Legge, & Cavanagh, submitted to JOSA). T-shaped probes (Fig. 8) were
placed on the tilted surface and the observer adjusted the relative lengths of the lines
until they appeared to have equal length. We assumed that the greater the perceived slant
of the plane, the shorter the setting of the adjustment line (foreshortening). Several
different simulated slants were presented by changing the shape of the trapezoid and the
perceived slant was determined from the amount of foreshortening in the adjustment. The
judgments of slant impression were accurate to within 3% of the veridical value in both
the equiluminous case and the luminance-defined case. These results imply that the
extraction of these perspective cues operates at a level following the integration of the
different attributes into a common representation.

[Figure 8; label in figure: "Standard".]

Fig. 8. Tests of perspective cues to slant. One bar was adjusted until it had the same
apparent length as the standard.
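
The relation between the length setting and perceived slant can be sketched under a simplifying assumption. The report does not give the formula used. The example below assumes that the adjusted line lies along the direction of slant, that the standard line is perpendicular to it, and that foreshortening is orthographic, so that the ratio of the two image lengths at the match equals the cosine of the perceived slant. The function name is hypothetical.

```python
import math

def perceived_slant_deg(adjusted_length, standard_length):
    """Perceived slant (degrees) implied by a length match, assuming
    orthographic foreshortening: adjusted / standard = cos(slant)."""
    ratio = adjusted_length / standard_length
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must lie in (0, 1] under this assumption")
    return math.degrees(math.acos(ratio))

# A setting half as long as the standard implies a slant near 60 degrees.
print(round(perceived_slant_deg(0.5, 1.0), 1))
```

A shorter setting gives a larger implied slant, in line with the assumption stated in the text.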

4.  Transparency.  Takeo  Watanabe  and  I  have  examined  the  surface 
decomposition accompanying transparency perception. We show that the surface
decomposition occurs rapidly, it affects even early stages of visual processing, and it
involves attributes such as texture and motion as well as color and brightness (see
Watanabe & Cavanagh, 1993a, for a review). In the first experiment (Watanabe &
Cavanagh, 1992b), we developed and validated a measure of functional transparency that
reduces the variability of traditional subjective report techniques. The method is a
reaction-time version of a technique originally described by de Weert (1986). Using four
overlapped digits (Figure 9) and varying the luminance of the overlap region, we found
that faster reaction times are obtained for stimuli with luminances valid for transparency
than for stimuli with invalid luminances (while holding border contrast magnitudes
constant). This advantage became apparent with as little as 60 msec of exposure of the
digits. Because the technique requires only a simple recognition response by the subject,
it has been adopted in at least two alert primate labs as a method of assessing
transparency perception in animals.

[Figure 9; panel titles: "Valid for luminance transparency", "Invalid for luminance transparency".]

Fig. 9. Transparency of stimuli evaluated using a recognition task. Recognition is easier
for the stimuli on the left than for those on the right.

In  a  second  experiment  (Watanabe,  Zimmerman,  &  Cavanagh,  1992),  the 
separation  of  overlying,  orthogonal  grids  due  to  transparency  was  found  to  influence  the 
strength  of  the  McCollough  effect,  an  effect  attributed  to  early  cortical  processing. 
Following  adaptation  to  traditional  vertical  red  and  horizontal  green  gratings,  observers 
were  shown  overlapping  horizontal  and  vertical  grids  which  could  either  look  like 
overlapping  sets  of  gratings  if  they  were  seen  as  transparent  or  as  a  checkerboard  pattern 
if  transparency  was  not  seen.  The  orientation-contingent  color  aftereffect  was 
significantly  larger  when  the  grids  appeared  to  be  transparent. 

In a third experiment (Watanabe & Cavanagh, 1991), we showed that
transparency  involves  not  only  surface  color  or  brightness  but  also  texture  and  motion. 
When  a  transparent  surface  appears  to  extend  over  areas  that  are  physically  identical  to 
the  background,  we  found  that  the  texture  and  motion  qualities  of  the  transparent  overlay 
also  appear  to  fill  the  illusory  overlying  region. 

In  two  other  experiments,  we  examined  the  nature  of  contours  and  contour 
junctions  involved  in  transparency.  First  (Watanabe  &  Cavanagh,  1992a),  we  discovered 
that  a  transparent  overlay  can  capture  some  of  the  features  of  surfaces  visible  beneath  it. 
Typically,  transparency  involves  seeing  two  superimposed  surfaces  at  different  depths. 
However,  if  the  contours  of  the  further  surface  (as  specified  by  binocular  disparity)  are 
defined  by  equiluminous  color  or  are  illusory,  the  colored  or  illusory  figures  appear  to  lie 
on  the  front,  transparent  surface  (note  that  in  the  latter  case,  the  inducing  figure  was  still 
perceived  to  lie  in  the  rear  plane).  The  ability  to  appreciate  a  depth  separation  between 
transparent  surfaces  may  therefore  require  the  presence  of  explicit  luminance  contours. 

In another experiment (Watanabe & Cavanagh, 1993b), we discovered that a
T-junction is not an invariant cue to occlusion but can signal transparency. Typically,
transparent  surfaces  are  signaled  by  X-junctions  where  the  contour  of  the  overlying 
transparent  surfaces  crosses  over  contours  of  the  surface  below.  There  is,  however,  a 
special  case  where  a  surface  can  return  the  same  amount  of  light  to  the  observer  whether 
viewed directly or viewed through an overlying, transparent surface (ie, when the light
reflected from the transparent surface equals the light loss in transmission through it). In
this case no border is visible between the transparent surface and the underlying
surface. A typical X-junction in this case becomes a T-junction and several demonstration
images  showed  that  observers  can  see  these  instances  as  transparent.  Transparency  is  not 
always  the  initial  impression  but  it  appears  always  to  be  a  possible  percept. 
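
The special case described above can be written out as a short worked example. It is a sketch under a simple assumed model in which the light reaching the eye through the overlay is the transmitted fraction of the underlying surface's light plus the light reflected by the overlay itself. The function names and numerical values are hypothetical.

```python
import math

def luminance_through_overlay(background, transmittance, reflected):
    """Light from a surface seen through a transparent layer: the transmitted
    part of the background plus the light reflected by the layer."""
    return transmittance * background + reflected

def border_visible(background, transmittance, reflected):
    """True if the overlay changes the light returned by the surface."""
    seen = luminance_through_overlay(background, transmittance, reflected)
    return not math.isclose(seen, background)

background = 40.0                                 # arbitrary luminance units
transmittance = 0.5
loss = (1.0 - transmittance) * background         # light lost in transmission

print(border_visible(background, transmittance, reflected=loss))   # False
print(border_visible(background, transmittance, reflected=10.0))   # True
```

When the reflected light equals the transmission loss, the overlay's edge disappears over that surface, so a contour that would otherwise cross it to form an X-junction stops at it and forms a T-junction.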

5.  Visual  search:  level  of  representation.  Recent  studies  of  visual  search  have 
demonstrated  rapid  processing  for  intermediate-  to  high-level  attributes  such  as  3-D 
surfaces  (He  &  Nakayama,  1992),  orientation  of  3-D  objects  (Enns  &  Rensink,  1991; 
1990b),  shadows  (Rensink  &  Cavanagh,  1993),  and  stimulus  familiarity  (Wang  & 
Cavanagh, 1992). These studies suggest that the visual system can process different
levels of image representation during rapid pattern discrimination. Satoru Suzuki and I
(Suzuki & Cavanagh, 1992) examined the case where visual search had a unique target
distinguished by both low and high levels of representation. The low-level attribute used
was curvature and the high-level attribute used was facial expression. We found that
visual search operates only on a high-level representation and that low-level
representations  are  no  longer  accessible  to  visual  search  processes  when  their  components 
become  integrated  into  higher-level  representations. 

Four  visual  search  tasks  were  designed  using  1)  feature  vs  conjunction  targets  and 
2)  the  presence/absence  of  facial  organization  as  the  two  variables.  In  Figure  10,  each 
quadrant  shows  an  example  of  a  stimulus  array  containing  one  target  and  five  distractors. 
The  target  pattern  consists  of  identical  elements  for  all  stimulus  arrays  (one  down  arc  and 
two  up  arcs  bounded  within  a  circle).  The  bull's-eye  at  the  center  is  the  fixation  point. 

In  the  feature  search,  the  target  contained  the  sole  downward  arc  in  the  display;  in 
the  conjunction  search,  targets  and  distractors  had  both  upward  and  downward  arcs  and 
were  distinguished  only  by  their  spatial  arrangement.  When  there  was  no  facial 
organization,  the  arcs  were  arranged  with  one  on  the  left  and  two  on  the  right.  When 
facial  organization  was  imposed,  the  triplets  of  arcs  were  rearranged  to  suggest  two  eyes 
and a mouth. In this case the target was always smiling and the distractors always
frowning. 

Fig. 10. Search stimuli defined either a feature or conjunction search with or without a
facial organization. When the face organization was present, search could be based either
on low-level curvature features or on high-level facial features (expression). The target is
always the rightmost element for convenience here. In the actual stimuli, all arcs were
identical and differed only in direction of curvature.

The  search  rate  in  the  conjunction  search  without  facial  features  was  quite  slow, 
consistent  with  Treisman’s  conjecture  that  serial  attention  is  required  to  conjoin  the 
different  features  of  the  target.  When  these  stimuli  were  arranged  in  facial  expressions, 
search  rates  speeded  up,  suggesting  the  expected  advantage  of  familiarity.  Search  rates 
for  the  feature  condition  without  facial  organization  were  rapid,  again  as  expected  for  a 
feature  search.  On  the  other  hand,  when  the  facial  organization  was  imposed  on  these 
stimuli,  search  slowed  down. 
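
Search rate is conventionally summarized as the slope of reaction time against the number of items in the display. The sketch below shows that computation with an ordinary least-squares fit. The report does not state how its search rates were computed or which display sizes were used, so the function name and the numbers in the example are hypothetical and serve only to show the calculation.

```python
def search_slope(set_sizes, reaction_times_ms):
    """Least-squares slope of reaction time (ms) against display size (items)."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(reaction_times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, reaction_times_ms))
    return sxy / sxx

# Hypothetical values, not data from these experiments.
print(search_slope([2, 4, 6], [500, 540, 580]))  # 20.0 ms per item
```

A shallow slope corresponds to the rapid search described for the feature condition and a steep slope to the slow search described for the conjunction condition.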

The  data  suggest  that  facial  organization  preempts  curvature  features,  making  the 
low-level  curvature  features  “invisible”  to  the  search  process  when  they  are  parts  of  a 
facial  organization.  This  “object  inferiority  effect”  implies  that  search  processes  do  not 
have  access  to  all  levels  of  representation  but  are  restricted  to  a  particular  high-level 
representation,  high  enough  for  the  familiarity  of  faces  to  have  direct  influence  on  search 
rates.  Moreover,  if  the  stimuli  reach  the  “searchable”  level  as  a  complex  gestalt  such  as  a 
face, the search is obliged to operate on those representations even if lower levels of
coding,  for  example,  the  curves  within  the  face,  would  offer  much  faster  processing. 
These  results  are  consistent  with  those  of  He  and  Nakayama  (1992)  who  showed  that 
when local features are integrated into surface representations, the local features are no
longer  available  to  visual  search. 


6.  Visual  search:  spatial  coordinates.  Satoru  Suzuki  and  I  (Suzuki  &  Cavanagh, 
1993)  also  examined  whether  pattern  discrimination  learning  is  specific  to  the  location  of 
the  target  in  retinal  coordinates  or  to  the  location  of  the  target  within  the  array  of 
distractors.  The  first  experiment  was  a  threshold  task  and  the  second  was  a  reaction  time 
task.  In  the  training  phase  of  the  first  experiment,  the  target,  S,  was  placed  at  location 
(2,2) in the upper left corner of a 6x6 rectangular array of S’s and the array was
positioned such that the S fell randomly on one of four training locations (Figure 11a).
The task was to decide if the flashed array contained the target. Presentation duration was
varied between 50 and 250 msec. Following 20 days of 2400 trials per day, threshold
durations for 75% correct responses dropped from 190 msec to 120 msec. The specificity
of this learning was tested by measuring thresholds with modified stimulus arrays and/or
locations: retinotopic specificity was tested by rotating the four target locations by 45°
(Figure 11b), and object-centered specificity by moving the target item into the lower
right corner of the array (Figure 11c) but keeping the retinal locations of the targets the
same as during training. The results showed that learning was both retinotopic and
object-centered. The learning persisted undiminished over four weeks. In contrast, extended
practice on an analogous reaction time task using identical stimuli suggested that learning
for this task was object-centered but not retinotopic and also rather short term (reaching
plateau/decaying within hours). A manuscript is in preparation.

[Figure 11; panel titles: a) Training, b) Object-centered, c) Retinotopic.]

Fig. 11. (a) Observers were trained to detect the sideways 10 which always
appeared at the same location within the array of 2’s. The four large dots indicate
the four training positions for the target (the array moved with the target). To test
transfer of training, the target locations were either (b) kept the same within the
array but moved to new retinal positions, or (c) kept at the same retinal positions
but moved to a new array location.

7. Object features and scene attributes in visual search. Ron Rensink and I
(Rensink & Cavanagh, 1993) have shown that shadows are explicitly analyzed at the
level  of  rapid  visual  search.  The  search  stimuli  included  objects  with  typical  cast  shadows 
and  others  with  similarly  shaped  regions  which  were  inappropriate  for  shadows  (Figure 
12).  We  reasoned  that  shadow  regions  need  to  be  rapidly  identified  and  suppressed  (and 
perhaps  attributed  to  the  background)  so  that  shadow  regions  may  be  less  available  to 
search  processes.  The  anomalous  regions  would  not  be  suppressed,  however,  and  should 
be  quite  noticeable.  This  suppression  of  shadow  regions  should  lead  to  an  interesting 
asymmetry  in  visual  search  rates.  Specifically,  when  the  target  contains  a  region 
interpreted  as  a  shadow  and  the  distractors  do  not,  the  target  will  be  defined  largely  by  the 
absence of a (suppressed) region, and so will be difficult to detect. Switching the items
used for target and distractor, however, will lead to fast search, as the target now contains
something (i.e., an unsuppressed region) not present in the distractors (Treisman &
Gormican, 1988). The results supported this line of reasoning in that search was fast for
an object with an anomalous “shadow” among objects with typical shadows (Fig. 12a),
but slow for the object with a typical shadow among objects with anomalous shadows
(Fig. 12b). Various image manipulations which rendered the shadow areas inappropriate
without affecting image geometry (eg, white “shadow” instead of black or outlining the
shadow with a thin white contour, Fig. 12c) also eliminated the search asymmetry,
suggesting that it was specifically the “shadowness” of the attached dark regions which
was producing the interesting asymmetry. The general pattern of asymmetries in search
speed also differed from that for shaded objects, implying separate handling of
shadow-based and shading-based processes at early levels. As a final point from this
research, the asymmetry in search rates disappeared if the displays were presented upside
down (Fig. 12 with the page turned upside down), implying that the dark regions were
interpreted as shadows only if the light appeared to come from above. In natural scenes,
light can come from below and, with unlimited viewing, leads to appropriate identification
of shadows. This flexibility for illuminant direction with unlimited viewing appears to be
sacrificed in rapid visual search, perhaps as one strategy for achieving high processing
speeds.

Fig. 12. Shadow search asymmetry. (a) When the target has an anomalous shadow or
non-shadow, it stands out, possibly because the normal shadows are suppressed or
subsumed into the background surface. (b) When the target has a normal shadow, it is
very hard to locate, again because the task requires search for a suppressed feature. (c)
Whenever the shadows are manipulated, as in these three examples, to block their
interpretation as a shadow, the search asymmetry disappears.

One  of  the  manipulations  that  eliminated  the  shadow-based  asymmetry  in  our 
preliminary  studies  was  to  place  a  short  white  contour  along  the  cast  shadow  border  (see 
Figure  12c  top).  This  feature  appeared  to  be  a  highlight  reflection  along  the  edge  of  the 
region  indicating  that  it  was  an  actual  piece  of  material  with  some  thickness,  strongly 
ruling  out  a  shadow  interpretation  for  the  region. 

We  believe  that  our  results  with  shadows  are  just  one  example  of  feature 
suppression  by  the  visual  system.  We  know  that  shadows  must  be  rapidly  identified  and 
suppressed  for  object  contours  to  be  processed.  We  predict  that  other  image  features  like 
the  brightness  patterns  of  highlights  and  shading  may  suffer  a  similar  fate.  That  is,  they 
are  identified  as  features  of  the  scene  lighting  and  become  detached  from  the  object  and 
discounted  or  suppressed.  They  leave  residual  effects  in  the  interpretation  of  the  object’s 
surface  material  but  are  not  easily  accessed  to  determine  their  actual  brightness  or  shape. 
This  approach  will  be  continued  with  further  studies  on  rapid  analysis  of  lighting  and 
object  features  using  in  particular  a  new  variation  of  the  search  task  (see  Research 
Methods).

Initial  tests  of  rapid  processing  of  transparency  have  supported  our  earlier  work 
(Watanabe  &  Cavanagh,  1992b)  showing  that  at  least  some  aspects  of  transparency  are 
available  at  the  level  of  visual  search.  Here  again  we  believe  that  brightness  patterns  of 
transparency  may  be  suppressed  —  specifically  in  the  overlapped  region  where  the 
brightness  is  parcelled  out  to  the  two  separate  surfaces  and  the  initial  brightness  of  the 
region  may  no  longer  be  accessible. 

8.  Object  recognition:  priming.  In  our  model  (Cavanagh,  1991),  recognition 
starts  with  an  initial,  crude  2-D  match  that  selects  a  “best”  prototype  to  explain  the  image 
data.  This  is  followed  by  more  sophisticated  3-D  analyses  to  complete  the  recognition 
process.  Our  first  experiment  showed  a  priming  effect  of  contours  in  recognition  even 
though many of the contours alone were uninformative for the task. This priming is
probing  an  early  (about  100  to  200  msec  into  processing)  2-D  stage  of  recognition. 
This suggests that a contour outline of a shadowed image should be able to
initiate the recognition processes that could be appropriately completed if the filled image
were then substituted. In our experiment, the prime is the full contour of the image (eg,
Fig. 13 top left) and is present for a variable duration before the presentation of the filled
image (eg, Fig. 13 top right). We are interested in how this contour can facilitate the
recognition process. Although this contour version may not look like a face at all, its
analysis should proceed in a similar manner to that for the filled face up to the point where
the image is checked for consistency with the matched prototype (the point at which
acceptable interpretations must be found for image contours that did not participate in the
match). If this stage begins while the contour representation is still present then the
advantages of the prime would be lost: many of the unexplained image contours will be
cast shadow borders (Figure 13, bottom right) and the regions within these borders will not
be appropriately dark. The face prototype would be rejected and the potential processing
gains all lost. If the filled version is inserted in time, however, the processing gains
should be retained.

We did not want to directly ask the observers to recognize the faces, however.
Although many of the contour versions of the faces are hard to recognize on their own (as
faces), many others clearly look like faces. We therefore devised a new task where the
contour itself could give no advance information relevant to the task. In our task, the
filled image is presented in either positive or negative contrast and the observer’s task is
to report which has been presented (Figure 14). When the image is positive, the observer
can quickly identify it as a face (all the stimuli were faces) and so knows that it is a
positive version. Since we felt that the response is mediated by recognition, we assumed
that the priming could play a role in speeding up the recognition. When the image was
negative, it was hard to recognize and often seemed a jumble of parts. Subjective reports
indicated that this lack of a coherent organization mediated the negative responses and
since they were not based on recognition, we assumed that priming would not play a role.

[Figure 13; panel titles: "Full contour", "Filled image", "Attached shadow and external contours", "Cast shadow contours".]

Fig. 13. The contour of a shadowed object may be much more difficult to interpret than the
filled, high-contrast image but it contains a highly informative subset of contours which
can support a match of the full contour to 2-D prototypes. The cast shadow contours are the
reason the full contour is difficult to interpret.

The experiment was based on 96 faces seen by 24 subjects. To avoid obvious cues to the
contrast, some faces had light surrounds, others dark; some had dark hair, others had
light hair (wigs were used); some were shadowed on the left, others on the right. No
subject saw the same face more than once. The contours of the prime and the subsequently
presented filled version were physically identical in shape and location in the display.
The results (Figure 15) showed a positive versus negative priming effect in reaction time
which reached a maximum at a prime duration of around 180 msec and decreased afterwards.
This result supports the notion of an early 2-D match but still leaves many points to be
examined before the theory can be accepted.

Fig. 14. The prime could be followed by either a positive or negative contrast test and
the observer’s task was to report rapidly which was presented.

Fig. 15. Response time results for the positive and negative contrast tests as a function
of the duration of the prime presentation (msec). The prime facilitates the response to
positive contrast tests more than that to negative tests and this difference is at a
maximum at 180 msec duration.

In the first control experiment, the unfilled prime and the filled test were from
different faces. We found no difference between positive and negative reaction times in
this case, showing that the effect was not due simply to some alerting signal generated by
the contours. We were still concerned, however, about the possible role of conscious
processing of the contours, since some of the contour versions could be identified as
faces. An additional experiment therefore evaluated the ease of recognition of the contour
versions. They were presented either upright or upside-down for 180 msec, followed by a
mask. The observers had to indicate the orientation of the face. We used these data to rank the
contour  versions  of  the  faces  in  terms  of  difficulty  of  recognition  and  then  compared  the 
ranking  for  each  face  to  the  priming  effect  (negative  minus  positive  RTs  at  180  msec) 
found  for  that  face  in  the  first  experiment.  If  consciously  recognizable  features  of  the 
contour  versions  were  producing  the  priming  effect,  then  the  easiest  faces  to  recognize 
should  also  show  the  largest  priming.  The  correlation,  however,  was  negligible  (-0.06). 
We  also  compiled  the  reaction  times  from  the  original  experiment  for  the  most  difficult 
third  of  the  faces.  The  average  error  rate  in  identifying  up  versus  down  was  35%  for  the 
32  contour  faces  with  the  highest  error  rates  —  chance  performance  was  50%.  The 
average  error  rate  was  only  5%  for  the  64  faces  having  the  fewest  errors.  Despite  these 
differences  in  recognition  performance,  the  difficult  faces  showed  a  similar  pattern  and 
magnitude  of  priming  as  the  easy-to-recognize  faces  in  the  original  experiment.  Given 
these  results,  we  are  confident  that  conscious  processing  of  the  contours  is  not  mediating 
the  priming  effect. 
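
The per-face comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the report does not say which correlation statistic was used, so a Spearman rank correlation is assumed here, and all function names and data values below are hypothetical, not taken from the experiment.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks (assumed statistic)."""
    return pearson(ranks(x), ranks(y))


def priming_effect(rt_negative_ms, rt_positive_ms):
    """Priming effect for one face: negative minus positive RT at the 180 msec prime."""
    return rt_negative_ms - rt_positive_ms


# Hypothetical per-face values, invented purely for illustration.
error_rates = [0.05, 0.40, 0.10, 0.30, 0.00, 0.20]          # orientation task
rt_negative = [640.0, 655.0, 630.0, 662.0, 648.0, 651.0]    # msec
rt_positive = [585.0, 598.0, 580.0, 601.0, 590.0, 600.0]    # msec

effects = [priming_effect(n, p) for n, p in zip(rt_negative, rt_positive)]
rho = spearman(error_rates, effects)
print(f"rank correlation between difficulty and priming: {rho:.2f}")
```

If conscious recognition of the contours drove the priming, the easiest faces (lowest error rates) would show the largest effects, giving a strongly negative correlation in this sketch; a value near zero, like the -0.06 reported, argues against that account.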

Our  final  concern  was  that  the  contours  were  simply  giving  advance  information 
about  the  location  of  the  contours  in  the  filled  image.  Our  experiment  to  assess  this 
possibility  is  described  in  the  Research  Methods  section.  Takeo  Watanabe  has  returned  to 
our  lab  for  the  summer  where  we  are  collaborating  on  this  extension  of  the  experiments. 
Specifically,  we  will  use  partial  contours  of  the  image  as  primes.  In  one  condition,  the 
contours  will  be  from  cast  shadow  borders  and  in  the  other  they  will  be  from  the  most 
informative  external  and  internal  object  contours. 